
Theory of Correlation among Constituents.

1. Introduction. Consider a material body which is composed of a
number of constituents. To be concrete, a body may contain chemical
constituents such as protein, oil, ash and carbohydrates in certain
quantities. It is known, in some cases, that such constituents
fluctuate in value from one individual to another of the same class.
Let $A_1, A_2, \ldots, A_N$ be a class of such individuals. Upon
selecting at random another body $A$ from this class in which one
constituent is relatively high, the question naturally arises whether
the other constituents will be relatively high or relatively low. The
discussion of this question of associated fluctuations of constituents
leads to what we have termed the theory of correlation among
constituents.

A constituent may be considered from either of two viewpoints. In the
first place, merely the absolute values of the constituents need be
considered, and the resulting correlation co-efficients employed as
the basis of an investigation. On the other hand, the constituents may
be viewed as proportional parts of the entire complex, in which case
the sum of the constituents is unity.

The latter viewpoint is chosen as the one upon which to base the
present investigation, since it permits the constituents to be viewed
as frequency groups whose sum is constant and equal to unity. Prof.
Karl Pearson has made a study of the correlations existing between
frequency groups whose sum is constant, and his article furnished
valuable suggestions by means of which to begin the investigation of
the problem which we have set.

2. Assumption of random sampling. Suppose that a substance is composed
of n constituents, $y_1, y_2, \ldots, y_n$. Then

$$ y_1 + y_2 + \cdots + y_n = 1. $$

Moreover, let it be assumed that the law of random sampling is in
operation. Then, if an increase is assigned to one constituent, the
most probable manner in which the resulting decrease (which must occur
among the remaining constituents taken together as a whole) will be
distributed among these constituents is in proportion to their
respective values.
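The proportional rule just stated can be illustrated with a minimal
sketch. The helper name `redistribute` and the sample proportions are
hypothetical and do not come from the original text; the code only
shows that the decrease taken from each remaining constituent is
proportional to its value and that the total stays at unity.

```python
def redistribute(y, i, delta):
    """Raise constituent i by delta and take the matching decrease from
    the other constituents in proportion to their values.
    Hypothetical helper, written only to illustrate the assumption."""
    rest = 1.0 - y[i]  # combined value of the remaining constituents
    out = []
    for j, value in enumerate(y):
        if j == i:
            out.append(value + delta)
        else:
            # each remaining constituent gives up its proportional share
            out.append(value - delta * value / rest)
    return out


y = [0.5, 0.3, 0.2]             # illustrative proportions, sum is unity
new_y = redistribute(y, 0, 0.1)  # increase the first constituent by 0.1
```

In this example the second and third constituents lose 0.06 and 0.04
respectively, that is, in the ratio 3 : 2 of their original values.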

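If the correlation between two constituents, $y_1$ and $y_2$, is
desired, we may assign a variation $\delta y_1$ to $y_1$. As a result,
$y_2$ will suffer a variation $\delta y_2$ opposite in sign to that of
$y_1$, and such that

$$ \frac{\delta y_2}{\delta y_1} = -\frac{y_2}{1 - y_1}. $$

Then

$$ \delta y_1 \, \delta y_2 = -\frac{y_2}{1 - y_1} \, (\delta y_1)^2. $$

Summing for all samplings, and dividing by the number of them,

$$ r_{12} \sigma_1 \sigma_2 = -\frac{y_2}{1 - y_1} \, \sigma_1^2, $$

or

$$ r_{12} = -\frac{y_2}{1 - y_1} \cdot \frac{\sigma_1}{\sigma_2}, \tag{1} $$

where $\sigma_1$ and $\sigma_2$ denote the standard deviations of
$y_1$ and $y_2$, and $r_{12}$ their correlation co-efficient. Again,
we may write

$$ \frac{\delta y_1}{\delta y_2} = -\frac{y_1}{1 - y_2}. $$

Then

$$ \delta y_1 \, \delta y_2 = -\frac{y_1}{1 - y_2} \, (\delta y_2)^2. $$

Summing for all samplings, and dividing by the number of them,

$$ r_{21} = -\frac{y_1}{1 - y_2} \cdot \frac{\sigma_2}{\sigma_1}. \tag{2} $$

Note. Pearson's article: "On the Probable Error of Frequency
Constants," Biometrika, Vol. 2, p. 275.
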
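Since in obtaining the relations (1) and (2) the total variation
caused in either of these two constituents by a change in the other
has been considered, the two correlation co-efficients are equal.
Therefore,

$$ \frac{y_2}{1 - y_1} \cdot \frac{\sigma_1}{\sigma_2}
   = \frac{y_1}{1 - y_2} \cdot \frac{\sigma_2}{\sigma_1}, $$

from which it is evident that

$$ \frac{\sigma_1^2}{y_1 (1 - y_1)} = \frac{\sigma_2^2}{y_2 (1 - y_2)}, \tag{3} $$

or

$$ \sigma_1 = k \sqrt{y_1 (1 - y_1)} \quad \text{and} \quad
   \sigma_2 = k \sqrt{y_2 (1 - y_2)}. \tag{4} $$

Substituting these values in (1) and (2), respectively,

$$ r_{12} = -\sqrt{\frac{y_1 y_2}{(1 - y_1)(1 - y_2)}}, \tag{5} $$

from which it appears that all the correlation co-efficients are
negative.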
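As a numerical check on this step, the following sketch evaluates the
random-sampling formula $r_{ij} = -\sqrt{y_i y_j / ((1-y_i)(1-y_j))}$
for every pair, and confirms that the same value results when
$\sigma_i = k\sqrt{y_i(1-y_i)}$ is put into the relation
$r_{ij} = -\frac{y_j}{1-y_i}\cdot\frac{\sigma_i}{\sigma_j}$. The
function names and the proportions are assumptions made for
illustration only.

```python
import math


def sigma(y_i, k=1.0):
    """Standard deviation under random sampling: k * sqrt(y(1 - y))."""
    return k * math.sqrt(y_i * (1.0 - y_i))


def r_from_relation_1(y, i, j):
    """Correlation from r = -(y_j / (1 - y_i)) * sigma_i / sigma_j."""
    return -(y[j] / (1.0 - y[i])) * sigma(y[i]) / sigma(y[j])


def random_sampling_r(y, i, j):
    """Closed form: r = -sqrt(y_i y_j / ((1 - y_i)(1 - y_j)))."""
    return -math.sqrt(y[i] * y[j] / ((1.0 - y[i]) * (1.0 - y[j])))


y = [0.4, 0.3, 0.2, 0.1]  # illustrative proportions summing to unity
pairs = [(i, j) for i in range(len(y)) for j in range(i + 1, len(y))]
r = {p: random_sampling_r(y, *p) for p in pairs}
r_check = {p: r_from_relation_1(y, *p) for p in pairs}
```

Every entry of `r` is negative, and `r_check` agrees with `r` to
rounding error, whatever positive value is chosen for the constant k.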

3. Special cases. If there are only two constituents in a substance,
then $y_2 = 1 - y_1$, and

$$ r_{12} = -\sqrt{\frac{y_1 y_2}{(1 - y_1)(1 - y_2)}} = -1. $$

That is, there is a perfect negative correlation, no matter in what
proportions the two constituents enter into the complex. If a larger
number of constituents is considered, such a relation no longer
exists. There are, however, several interesting special cases that
should be mentioned.

It is evident from relations (4) and (5) that if the constituents are
practically equal, the standard deviations are equal, and all of the
correlation co-efficients are likewise equal. That is, if each of the
constituents fluctuates very closely about the mean value $1/n$, then

$$ r_{ij} = -\frac{1}{n - 1}. $$

This reduces, for the case of two constituents, to $-1$; for three
equal constituents to $-\tfrac{1}{2}$; and so on.

Moreover, if the n constituents are so related that the product of two
of them divided by the sum of the remaining constituents is constant,
then the correlation between the first two remains constant. For
example, if

$$ \frac{y_1 y_2}{1 - y_1 - y_2} = c, $$

then

$$ r_{12} = -\sqrt{\frac{y_1 y_2}{1 - y_1 - y_2 + y_1 y_2}}
          = -\sqrt{\frac{c}{1 + c}}. $$


Again, if the sum of n-2 constituents is constant, then the maximum
(numerical) correlation between the remaining two results for equal
values of these two constituents, and a minimum when either of them is
practically equal to zero. For, if the sum of the n-2 constituents be
represented by a, then $1 - y_1 = a + y_2$ and $1 - y_2 = a + y_1$, so
that

$$ r_{12} = -\sqrt{\frac{y_1 y_2}{(a + y_1)(a + y_2)}}. \tag{9} $$

The expression under the radical sign is a maximum for $y_1 = y_2$,
and a minimum for either practically equal to zero.
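The special cases of this article can be checked numerically from the
random-sampling formula
$r_{12} = -\sqrt{y_1 y_2 / ((1-y_1)(1-y_2))}$. The sketch below is
illustrative only: the function name `r5`, the constant `c = 0.5`, the
fixed sum `a = 0.4`, and the grid of trial values are all assumptions
chosen for the example.

```python
import math


def r5(y1, y2):
    """Random-sampling correlation of two constituents y1 and y2."""
    return -math.sqrt(y1 * y2 / ((1 - y1) * (1 - y2)))


# Two constituents: y2 = 1 - y1 gives perfect negative correlation.
two = r5(0.3, 0.7)

# n equal constituents, each equal to 1/n: the value is -1/(n - 1).
equal = {n: r5(1 / n, 1 / n) for n in (2, 3, 4, 5)}

# Product case: y1 * y2 / (1 - y1 - y2) = c, solved for y2.
c = 0.5


def y2_given(y1):
    return c * (1 - y1) / (y1 + c)


product_case = [r5(y1, y2_given(y1)) for y1 in (0.1, 0.2, 0.4)]

# Sum case: the other n - 2 constituents total a, so y1 + y2 = 1 - a.
a = 0.4
grid = [k / 100 for k in range(1, 60)]
strongest = max(grid, key=lambda y1: abs(r5(y1, 1 - a - y1)))
```

Here `product_case` holds the same value, $-\sqrt{c/(1+c)}$, for each
trial $y_1$, and `strongest` falls at $y_1 = y_2 = (1-a)/2$.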

4. Numerical cases. At this point correlation tables were constructed
and the correlation co-efficients computed for the case of a substance
composed of five constituents. These constituents are represented in
the tables by the numbers 1, 2, 3, 4, and 5. (See pp. [illegible]-22.)
Examination of these tables shows that the correlation co-efficients
are not all negative, as is the case under the assumption of random
sampling. Arranging the co-efficients in two columns, and placing the
values of the correlation co-efficient obtained from the tables in the
first column and those obtained from the formula

$$ r_{ij} = -\sqrt{\frac{y_i y_j}{(1 - y_i)(1 - y_j)}} $$

in the second, a large discrepancy appears.

    From the tables     From the formula
        -0.125              -0.467
        -0.508              -0.574
        -0.135              -0.095
        -0.815              -0.488
        +0.144              -0.555
        +0.164              -0.028
        -0.143              -0.045
        +0.3?3              -0.0?4
        -0.008              -0.056
        -0.091              -0.029

The ten rows are labeled by the pairs of constituents, $r_{12}$
through $r_{45}$; the individual row labels, and the digits marked
"?", are not legible in the source.

The data employed in the tables were furnished through the courtesy of
Dr. H. S. Grindley, and are the results of a series of experiments, as
yet unpublished, conducted by the Laboratory of Physiological
Chemistry of the Department of Animal Husbandry.

This discrepancy may be explained by calling attention to the fact
that when the absolute weights of the constituents, which had been
chosen at random, were divided by the absolute weights of their
respective samples, a set of values was obtained which did not exhibit
random sampling. Prof. Karl Pearson has attributed this discrepancy,
which occurs when indices are formed from absolute values, to what he
terms "spurious correlation". It is evident, therefore, that the
theory of correlation among constituents must be developed on some
assumption other than that of random sampling. It may be developed
upon a purely mathematical basis from the mathematical definition of a
correlation co-efficient and the assumption that the sum of the
constituents is always unity.
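The effect of forming proportions from absolute weights can be
illustrated on synthetic data. The sketch below does not use the
experimental data of the text, which are not reproduced here; it draws
three absolute weights independently (an assumption made purely for
illustration), divides each by the weight of its sample, and compares
the correlation before and after the division. The seed, the sample
size, and the range of the weights are arbitrary choices.

```python
import math
import random


def corr(a, b):
    """Pearson correlation co-efficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


random.seed(0)  # fixed seed so that the sketch is repeatable

# Synthetic absolute weights of three constituents, drawn independently.
weights = [[random.uniform(1.0, 2.0) for _ in range(3)]
           for _ in range(5000)]

# Divide each weight by the weight of its sample to form proportions.
props = [[w / sum(row) for w in row] for row in weights]

r_abs = corr([row[0] for row in weights], [row[1] for row in weights])
r_prop = corr([row[0] for row in props], [row[1] for row in props])
```

The absolute weights are nearly uncorrelated, while the proportions
formed from them show a marked negative correlation, although nothing
but the division has changed.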

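5. Mathematical theory. If the deviations of any two constituents from
their mean values be represented by $Y_1$ and $Y_2$, respectively,
then a mathematical definition of a correlation co-efficient is given
by

$$ r_{12} = \frac{\sum Y_1 Y_2}{N \sigma_1 \sigma_2}, $$

where $N$ is the number of individuals, and
$\sigma_1 = \sqrt{\sum Y_1^2 / N}$ and
$\sigma_2 = \sqrt{\sum Y_2^2 / N}$ are the standard deviations.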

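Suppose that a substance is composed of n constituents,
$y_1, y_2, \ldots, y_n$, where

$$ y_1 + y_2 + \cdots + y_n = 1. $$

Now, if the means of these constituents be represented by
$\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n$, respectively, we may
transform co-ordinates to a new system which has its origin at the
means of the n constituents. If the deviations from these means be
denoted by $Y_1, Y_2, \ldots, Y_n$, respectively, then

$$ (Y_1 + \bar{y}_1) + (Y_2 + \bar{y}_2) + \cdots + (Y_n + \bar{y}_n) = 1. $$

But

$$ \bar{y}_1 + \bar{y}_2 + \cdots + \bar{y}_n = 1, $$

since it is the sum of the mean values of values whose sum is
constantly equal to 1. Hence $Y_1 + Y_2 + \cdots + Y_n = 0$. Now, by
definition,

$$ r_{12} = \frac{\sum Y_1 Y_2}{N \sigma_1 \sigma_2}
          = \frac{\sum Y_1 (-Y_1 - Y_3 - \cdots - Y_n)}{N \sigma_1 \sigma_2}
          = -\frac{\sigma_1^2 + r_{13} \sigma_1 \sigma_3 + \cdots
                   + r_{1n} \sigma_1 \sigma_n}{\sigma_1 \sigma_2}, $$

from which

$$ \sigma_1 + r_{12} \sigma_2 + r_{13} \sigma_3 + \cdots + r_{1n} \sigma_n = 0. $$

Likewise, if the value of $Y_1$ be substituted in $\sum Y_1 Y_2$, the
relation becomes

$$ r_{12} \sigma_1 + \sigma_2 + r_{23} \sigma_3 + \cdots + r_{2n} \sigma_n = 0. $$

That is, in the case of n constituents, the $n(n-1)/2$ correlation
co-efficients satisfy the n relations

$$ \sigma_i + \sum_{j \neq i} r_{ij} \sigma_j = 0,
   \qquad i = 1, 2, \ldots, n. \tag{10} $$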
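The n relations $\sigma_i + \sum_{j \neq i} r_{ij}\sigma_j = 0$ hold
for any set of individuals whose constituents sum to unity, and this
can be confirmed on a small table. The five rows below are made-up
proportions, not data from the text, and the helper names are
illustrative.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Standard deviation about the mean, with divisor N."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))


def corr(a, b):
    """Correlation co-efficient: sum of Y_a * Y_b over N * sd_a * sd_b."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (sd(a) * sd(b))


# Made-up compositions of five individuals; each row sums to unity.
rows = [
    [0.50, 0.30, 0.15, 0.05],
    [0.40, 0.35, 0.15, 0.10],
    [0.45, 0.25, 0.20, 0.10],
    [0.55, 0.20, 0.20, 0.05],
    [0.35, 0.30, 0.25, 0.10],
]
cols = [list(c) for c in zip(*rows)]  # one list per constituent
n = len(cols)
sigma = [sd(c) for c in cols]

# Left-hand side of each of the n relations; all should vanish.
residuals = [
    sigma[i] + sum(corr(cols[i], cols[j]) * sigma[j]
                   for j in range(n) if j != i)
    for i in range(n)
]
```

Each entry of `residuals` is zero to rounding error, whatever the
individual proportions are, provided every row sums to unity.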


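6. Special cases. In the case of two constituents these relations take
the form

$$ \sigma_1 + r_{12} \sigma_2 = 0, \qquad r_{12} \sigma_1 + \sigma_2 = 0, $$

or

$$ \sigma_2 r_{12} = -\sigma_1, \qquad \sigma_1 r_{12} = -\sigma_2, $$

from which

$$ r_{12} = -\frac{\sigma_1}{\sigma_2} \quad \text{and} \quad
   r_{12} = -\frac{\sigma_2}{\sigma_1}, $$

or

$$ \sigma_1^2 = \sigma_2^2 $$

and

$$ r_{12} = -1, $$

which agrees with the result obtained under the assumption of random
sampling.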

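In the case of three constituents

$$ \sigma_1 + r_{12} \sigma_2 + r_{13} \sigma_3 = 0, $$
$$ r_{12} \sigma_1 + \sigma_2 + r_{23} \sigma_3 = 0, $$
$$ r_{13} \sigma_1 + r_{23} \sigma_2 + \sigma_3 = 0, $$

from which

$$ r_{12} = \frac{\sigma_3^2 - \sigma_1^2 - \sigma_2^2}{2 \sigma_1 \sigma_2},
   \quad
   r_{13} = \frac{\sigma_2^2 - \sigma_1^2 - \sigma_3^2}{2 \sigma_1 \sigma_3},
   \quad
   r_{23} = \frac{\sigma_1^2 - \sigma_2^2 - \sigma_3^2}{2 \sigma_2 \sigma_3}.
   \tag{12} $$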
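For three constituents the correlation co-efficients are thus fixed by
the standard deviations alone, for instance
$r_{12} = (\sigma_3^2 - \sigma_1^2 - \sigma_2^2)/(2\sigma_1\sigma_2)$.
The sketch below compares this expression with the co-efficient
computed directly from a small table. The five rows are made-up
proportions, and the function names are illustrative.

```python
import math


def sd(v):
    """Standard deviation about the mean, with divisor N."""
    m = sum(v) / len(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))


def corr(a, b):
    """Correlation co-efficient computed from the deviations."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (sd(a) * sd(b))


def r_from_sds(s_a, s_b, s_c):
    """Correlation of constituents a and b from the three standard
    deviations, c being the third constituent."""
    return (s_c ** 2 - s_a ** 2 - s_b ** 2) / (2 * s_a * s_b)


# Made-up compositions of five individuals; each row sums to unity.
rows = [
    [0.50, 0.30, 0.20],
    [0.40, 0.35, 0.25],
    [0.45, 0.25, 0.30],
    [0.60, 0.20, 0.20],
    [0.35, 0.40, 0.25],
]
y1, y2, y3 = [list(c) for c in zip(*rows)]
s1, s2, s3 = sd(y1), sd(y2), sd(y3)

direct = (corr(y1, y2), corr(y1, y3), corr(y2, y3))
formula = (r_from_sds(s1, s2, s3),
           r_from_sds(s1, s3, s2),
           r_from_sds(s2, s3, s1))
```

The two triples agree to rounding error, since the third deviation is
always the negative of the sum of the other two.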


In the case of four or more constituents, it is impossible to solve
the n independent equations for the n(n-1)/2 correlation co-efficients
in terms of the standard deviations alone. That is, the value of each
correlation co-efficient depends upon the values of certain other
correlation co-efficients. For instance, for n = 4, there are six
correlation co-efficients and only four relations by means of which to
determine them; for n = 5, there are only five relations by means of
which to determine ten correlation co-efficients.
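The count behind this remark can be tabulated directly. The sketch
below simply compares the number of co-efficients, n(n-1)/2, with the
number of relations, n; the function name is illustrative, and the
comparison is a counting argument only, not a test of solvability in
any particular case.

```python
def counts(n):
    """Return (number of correlation co-efficients, number of relations)."""
    return n * (n - 1) // 2, n


table = {n: counts(n) for n in range(2, 7)}

# Values of n for which the unknowns do not outnumber the relations.
determined = [n for n, (coeffs, relations) in table.items()
              if coeffs <= relations]
```

Only n = 2 and n = 3 appear in `determined`; from n = 4 onward the
co-efficients outnumber the relations.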

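Returning to (12), it is rather interesting to note that if the values

$$ \sigma_1 = k \sqrt{y_1 (1 - y_1)}, \quad
   \sigma_2 = k \sqrt{y_2 (1 - y_2)} \quad \text{and} \quad
   \sigma_3 = k \sqrt{y_3 (1 - y_3)}, $$

which were obtained under the assumption of random sampling, be
substituted in relations (12), then

$$ r_{12} = -\sqrt{\frac{y_1 y_2}{(1 - y_1)(1 - y_2)}}, \quad
   r_{13} = -\sqrt{\frac{y_1 y_3}{(1 - y_1)(1 - y_3)}}, \quad
   r_{23} = -\sqrt{\frac{y_2 y_3}{(1 - y_2)(1 - y_3)}}, $$

which values are identical with those obtained under the assumption
of random sampling.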
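This agreement is easily confirmed for a particular case. The sketch
below puts $\sigma_i = k\sqrt{y_i(1-y_i)}$ into
$r_{12} = (\sigma_3^2 - \sigma_1^2 - \sigma_2^2)/(2\sigma_1\sigma_2)$
and its two companions, and compares the results with
$-\sqrt{y_i y_j/((1-y_i)(1-y_j))}$. The proportions and the value of k
are arbitrary illustrative choices.

```python
import math


def sigma(y_i, k):
    """Random-sampling standard deviation k * sqrt(y(1 - y))."""
    return k * math.sqrt(y_i * (1 - y_i))


def r_from_sds(s_a, s_b, s_c):
    """Three-constituent relation: correlation of a and b from the
    standard deviations, c being the third constituent."""
    return (s_c ** 2 - s_a ** 2 - s_b ** 2) / (2 * s_a * s_b)


def r_random_sampling(y_a, y_b):
    return -math.sqrt(y_a * y_b / ((1 - y_a) * (1 - y_b)))


y1, y2, y3 = 0.5, 0.3, 0.2  # illustrative proportions, sum is unity
k = 0.7                      # arbitrary positive constant
s1, s2, s3 = sigma(y1, k), sigma(y2, k), sigma(y3, k)

from_12 = (r_from_sds(s1, s2, s3),
           r_from_sds(s1, s3, s2),
           r_from_sds(s2, s3, s1))
from_5 = (r_random_sampling(y1, y2),
          r_random_sampling(y1, y3),
          r_random_sampling(y2, y3))
```

The constant k cancels, so the agreement does not depend on its value.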

Moreover, if in (10) we assume all the standard deviations to be
equal, and the correlation co-efficients likewise equal to one
another, each correlation co-efficient is equal to $-\frac{1}{n-1}$,
which is the value obtained in Art. 3 under the assumption of random
sampling for equal values of the constituents and the resulting equal
values of the standard deviations.

7. Signs of the correlation co-efficients. From (10), since the
standard deviations are always taken positively, the apparent
conclusion is that all of the correlation co-efficients may be
negative. This may be proven in the following manner:

Let the n constituents of a substance be represented by
$y_1, y_2, \ldots, y_n$. Suppose that these constituents in a certain
individual take the definite values $a_1, a_2, \ldots, a_n$,
respectively, and that these values have been permuted. In other
words, we have $n!$ samples, such that their constituents satisfy the
above conditions. Each value is, therefore, assigned to the
$y_1$-constituent $(n-1)!$ times, so that the mean of the
$y_1$-constituent is

$$ \frac{(n-1)! \, (a_1 + a_2 + \cdots + a_n)}{n!}. $$

But

$$ a_1 + a_2 + \cdots + a_n = 1, $$

so that the mean of the $y_1$-constituent is $1/n$. The mean of each
constituent is, therefore, $1/n$, since the constituents are treated
alike in the permutation.

The correlation between any two constituents, which is given by

$$ r_{12} = \frac{\sum \left(y_1 - \frac{1}{n}\right)
                       \left(y_2 - \frac{1}{n}\right)}{N \sigma_1 \sigma_2}, $$

will have the same sign as
$\sum \left(y_1 - \frac{1}{n}\right)\left(y_2 - \frac{1}{n}\right)$,
since $N$, $\sigma_1$ and $\sigma_2$ are all positive. On account of
the fact that the distribution of each constituent has been built up
by permuting the n constituents n at a time, the sum of the product
deviations is the same for each of the n(n-1)/2 pairs of constituents.
Therefore, if any one of the correlation co-efficients is shown to be
negative, then all of them are equal and negative in sign.

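Let us consider the $y_1$- and $y_2$-constituents. Let $a_1$ remain
fixed as the $y_1$-constituent; then a permutation of the n-1
remaining values will cause each of them to be the $y_2$-constituent
$(n-2)!$ times. Then for all samples for which $a_1$ is the
$y_1$-constituent the product deviations will add up to

$$ (n-2)! \left(a_1 - \frac{1}{n}\right)
   \left[\left(a_2 - \frac{1}{n}\right) + \left(a_3 - \frac{1}{n}\right)
   + \cdots + \left(a_n - \frac{1}{n}\right)\right]
   = -(n-2)! \left(a_1 - \frac{1}{n}\right)^2, $$

since the n deviations $a_i - \frac{1}{n}$ sum to zero. Likewise, the
sum of the product deviations for all cases for which $a_2$ is the
$y_1$-constituent is given by
$-(n-2)! \left(a_2 - \frac{1}{n}\right)^2$. Therefore,

$$ \sum \left(y_1 - \frac{1}{n}\right)\left(y_2 - \frac{1}{n}\right)
   = -(n-2)! \sum_{i=1}^{n} \left(a_i - \frac{1}{n}\right)^2 $$

is negative in sign and, therefore, all of the correlation
co-efficients are negative.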
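The permutation argument can be carried out by direct enumeration for
a small n. The sketch below permutes four illustrative values summing
to unity (the values are an assumption, not taken from the text),
forms the sum of the product deviations for the first two
constituents, and compares it with
$-(n-2)!\sum_i (a_i - 1/n)^2$.

```python
import math
from itertools import permutations

a = [0.4, 0.3, 0.2, 0.1]          # illustrative values, sum is unity
n = len(a)
samples = list(permutations(a))   # the n! samples
m = 1 / n                         # mean of every constituent

# Sum of the product deviations of the first and second constituents.
prod_sum = sum((s[0] - m) * (s[1] - m) for s in samples)

# Value given by the argument in the text.
predicted = -math.factorial(n - 2) * sum((x - m) ** 2 for x in a)

# Both constituents have the same standard deviation, so that
# N * sigma_1 * sigma_2 equals the sum of squared deviations of either.
sq_sum = sum((s[0] - m) ** 2 for s in samples)
r12 = prod_sum / sq_sum
```

For these samples `r12` comes out as $-1/(n-1)$, which is consistent
with the value found for equal standard deviations.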
Again, from (10) it is evident that all of the correlation
co-efficients can not be positive. Moreover, it is evident that the
correlations of each constituent with each of the remaining
constituents can not all be positive. If the various groups of
correlation co-efficients be written down which contain the
correlations of each constituent with all the remaining constituents,
there result n such groups, each containing n-1 correlation
co-efficients. At least one correlation co-efficient in each group
must be negative, and since each correlation co-efficient appears in
two, and only two, groups, there can not be less than n/2 negative
correlation co-efficients if n is even, and not less than (n+1)/2
negative correlation co-efficients if n is odd.
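The counting argument can be checked by brute force for small n. The
sketch below tries every assignment of signs to the n(n-1)/2
co-efficients and finds the smallest number of negative signs such
that every constituent has at least one negative co-efficient in its
group. It tests only this counting condition; it does not examine
whether a given pattern of signs can actually arise from data. The
function name is illustrative.

```python
from itertools import combinations, product


def min_negative(n):
    """Smallest number of negative co-efficients such that each of the
    n groups contains at least one negative co-efficient."""
    pairs = list(combinations(range(n), 2))
    best = len(pairs)
    # True marks a negative co-efficient for the corresponding pair.
    for signs in product((False, True), repeat=len(pairs)):
        covered = set()
        for (i, j), negative in zip(pairs, signs):
            if negative:
                covered.update((i, j))
        if len(covered) == n:  # every constituent has a negative partner
            best = min(best, sum(signs))
    return best


minimum = {n: min_negative(n) for n in (2, 3, 4, 5)}
```

The minima agree with n/2 for even n and (n+1)/2 for odd n.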

8. Summary. (a) Under the assumption of random sampling a theory is
developed from which all the correlation co-efficients connecting
constituents are negative.

(b) Computation of the correlation co-efficients from tables by the
usual method gives values which do not arise under the assumption of
random sampling.

(c) A mathematical theory is developed from the mathematical
definition of a correlation co-efficient which is consistent with
computed values of the correlation co-efficients. For the case of two
constituents or for equal standard deviations, the results agree with
those obtained from the assumption of random sampling as made in
Art. 2.

(d) All of the correlation co-efficients may be negative.

(e) All of the correlation co-efficients can not be positive.

(f) If n is even, not less than n/2 of the correlation co-efficients
must be negative; if n is odd, not less than (n+1)/2 of the
correlation co-efficients must be negative.